Abstract: A pragmatic coded modulation system is presented 
that incorporates signal shaping and exploits the excellent 
performance and efficient high-speed decoding architecture of 
staircase codes. Reliable communication within 0.62 bits/s/Hz 
of the estimated capacity (per polarization) of a system with 
L = 2000 km is provided by the proposed system, with an error 
floor below $10^{-20}$. Also, it is shown that digital backpropagation 
increases the achievable spectral efficiencies — relative to linear 
equalization — by 0.55 to 0.75 bits/s/Hz per polarization. 

Index Terms: Staircase codes, fiber-optic communications,
digital backpropagation, forward error correction, coded
modulation, channel capacity.



I. Introduction 

RECENT progress has been made in estimating the 
information-theoretic capacity of the class of fiber-optic 
communication systems that are (presently) of commercial 
interest [1], but existing systems perform far from the fundamental
limits of the channel. While signal processing and
coded modulation techniques promise to eliminate this gap, 
their implementations — at the speeds present in fiber-optic 
systems — present significant challenges. 

Many existing proposals for coded modulation in fiber-optic 
communication systems amount to using techniques currently 
used in electrical wireline and wireless communication systems.
For example, in [2], the authors propose a concatenated
coding system with inner trellis-coded modulation (for an 
8-PSK constellation) and an outer product-like code, and 
in [3]-[5], the authors propose using low-density parity-check
(LDPC) codes for coded modulation. In both cases, the propos- 
als are verified by simulation, where the channel is assumed 
to be a classical additive white Gaussian noise (AWGN) channel, but
no consideration is given to the real-world implementation 
challenges for the proposed systems. 

Other proposals for coded modulation in fiber-optic systems 
consider simplified channel models, and design codes for the 
resulting systems. For example, in [6], the authors design
a trellis-coded polarization-shift-keying modulation system, 
but their channel model only considers laser phase noise, 
i.e., effects related to the propagation over fiber are completely
ignored. In [7], the authors consider a nonlinear phase
noise channel model studied by [8], and design a multi-level
coded modulation system with Reed-Solomon codes at each
level. However, this channel model assumes a single-channel 
dispersion-less system, which is not of practical interest. 

In this paper, we take a pragmatic approach to coded modulation
for fiber-optic systems that addresses the deficiencies
of the aforementioned proposals. Because product-like codes
with syndrome-based decoding have efficient high-speed
decoders [9], we consider systems with hard-decision
decoding. Furthermore, the channel model for which the codes
are designed is not a simplified one, but rather is derived
from (computationally intensive) simulations of the fiber-optic
systems based on the generalized nonlinear Schrödinger
(GNLS) equation [(1), below], and thus accurately models
the non-AWGN channel that occurs in optical communication 
systems. 

In contrast to most classically studied communication chan- 
nels, optical fiber exhibits significant nonlinearity (in the 
intensity of the guided light) [10]. Furthermore, amplification
acts as a source of distributed AWGN, and fiber chromatic 
dispersion acts as a distributed linear filter. Complicating 
matters, these three fundamental effects interact over the 
length of transmission. In [1], signal processing is performed
via digital backpropagation, in order to attempt to compensate 
the channel impairments. However, their results do not quan- 
tify the benefits of this compensation strategy. Since digital 
backpropagation is computationally expensive, one approach 
to reducing the computational burden is to increase the step-size
of the algorithm, as in [11], [12]. In this paper, we
compare the achievable rates for two extreme cases: digital 
backpropagation (as in [1]), and a linear equalizer (which can
be considered as a form of "linear backpropagation" in which 
the step-size is the system's length). 

In Section II we review staircase codes, the system model
for a fiber-optic communication system, and digital backpropagation.
In Section III we compare the transmission rates
that can be achieved using digital backpropagation with those
achievable by linear equalization. In Section IV we present the
details of a pragmatic coded modulation system, and compare 
the performance of the system to the capacity estimates. 

II. Preliminaries 

A. Staircase Codes 

Staircase codes [9] are a family of high-rate binary
error-correcting codes suitable for high-speed fiber-optic
communications. Staircase codes can be interpreted as generalized
LDPC codes, that is, sparse graph-based codes whose
constraint nodes are error-correcting codes, not the
single-parity-check error-detecting codes used in the constraint nodes
of conventional LDPC codes. With such generalized LDPC
codes, algebraic decoding can be applied at the constraint
nodes, and the decoder can operate exclusively on syndromes.
As discussed in [9], this significantly reduces the decoder
data-flow (relative to a message-passing LDPC decoder), admitting
an efficient high-speed implementation. Furthermore, due to
the error-correcting capabilities of the constraint nodes,
staircase codes have very low error floors, which can be estimated
analytically. Finally, due to their structural properties, staircase
codes provide superior performance to product codes.

Staircase codes are completely characterized by the relationship
between successive matrices of symbols. Specifically,
consider the (infinite) sequence $B_0, B_1, B_2, \ldots$ of m-by-m
matrices $B_i$, $i \in \mathbb{Z}^+$. Block $B_0$ is initialized to a reference
state known to the encoder-decoder pair, e.g., block $B_0$ could
be initialized to the all-zeros state, i.e., an m-by-m array
of zero symbols. Furthermore, we select a conventional FEC
code (e.g., Hamming, BCH, Reed-Solomon, etc.) in systematic
form to serve as the component code; this code, which we
henceforth refer to as C, is selected to have blocklength 2m
symbols, r of which are parity symbols.

Generally, the relationship between successive blocks in a
staircase code satisfies the following relation: for any $i \geq 1$,
each of the rows of the matrix $[B_{i-1}^T \; B_i]$ is a valid codeword
in C. Just as in a conventional product code, any given symbol
in any given block $B_i$ participates in two constraints: one to
satisfy the condition that each row of $[B_{i-1}^T \; B_i]$ is a codeword
of C, and one to satisfy the condition that each row of
$[B_i^T \; B_{i+1}]$ is a codeword of C.
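The row constraint above can be sketched with a toy encoder. This is only an illustration under stated assumptions: the component code here is a length-2m single-parity-check code (r = 1), chosen for brevity and not the BCH-type code the paper has in mind, and the names `encode_block` and `rows_are_codewords` are hypothetical.

```python
import random

M = 4  # block side length m (toy value, assumed for illustration)


def transpose(block):
    """Return the transpose of an m-by-m block stored as a list of rows."""
    return [list(col) for col in zip(*block)]


def encode_block(prev_block, info_bits):
    """Build B_i from B_{i-1} and an m-by-(m-1) array of information bits.

    Row j of [B_{i-1}^T  B_i] must be a codeword of the toy component
    code: a length-2m single-parity-check code (r = 1 parity symbol).
    """
    prev_t = transpose(prev_block)
    block = []
    for j in range(M):
        parity = (sum(prev_t[j]) + sum(info_bits[j])) % 2
        block.append(list(info_bits[j]) + [parity])
    return block


def rows_are_codewords(prev_block, block):
    """Check that every row of [B_{i-1}^T  B_i] has even parity."""
    prev_t = transpose(prev_block)
    return all((sum(prev_t[j]) + sum(block[j])) % 2 == 0 for j in range(M))


rng = random.Random(0)
blocks = [[[0] * M for _ in range(M)]]  # B_0: all-zeros reference state
for _ in range(5):
    info = [[rng.randint(0, 1) for _ in range(M - 1)] for _ in range(M)]
    blocks.append(encode_block(blocks[-1], info))

all_valid = all(rows_are_codewords(blocks[i - 1], blocks[i])
                for i in range(1, len(blocks)))
```

Because each new block is built from the transpose of the previous one, every symbol ends up in one row constraint with its predecessor block and one with its successor block, which is the two-constraint property described in the text.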

B. System Model 

We consider a coherent fiber-optic communication system. 
Between the transmitter and receiver, standard single-mode 
fiber and ideal distributed Raman amplification are assumed, 
but we note that the methods presented herein also apply 
to alternate system configurations (e.g., systems with inline 
dispersion-compensating fiber, and/or lumped amplification). 
The complex baseband representation of the signal in a single 
polarization at the output of the transmitter is A(0,t), and 
at the input of the receiver is A(L, t), where L is the total 
system length; note that A(z,t) represents the full field, i.e., 
in general it represents co-propagating dense wavelength- 
division-multiplexed signals. 

The generalized nonlinear Schrödinger (GNLS) equation
expresses the evolution of A(z,t):

$$\frac{\partial A}{\partial z} + \frac{j\beta_2}{2} \frac{\partial^2 A}{\partial t^2} - j\gamma |A|^2 A = n(z,t). \qquad (1)$$

Since ideal distributed Raman amplification is assumed, the
loss term has been omitted, and n(z,t) is a circularly symmetric
complex Gaussian noise process with autocorrelation

$$E[n(z,t)\, n^*(z',t')] = \alpha h \nu_s K_T \, \delta(z - z', t - t'),$$

where h is Planck's constant, $\nu_s$ is the optical frequency, and
$K_T$ is the phonon occupancy factor. In Table I we provide
parameter values for the system components.

TABLE I
System Parameter Values

Second-order dispersion $\beta_2$      -21.668 ps$^2$/km
Loss $\alpha$                          $4.605 \times 10^{-5}$ m$^{-1}$
Nonlinear coefficient $\gamma$         1.27 W$^{-1}$km$^{-1}$
Center carrier frequency $\nu_s$       193.41 THz
Phonon occupancy factor $K_T$          1.13



Note that the scalar equation (1), whose numerical solution
is used to generate all of the results of this paper, governs
propagation of waveforms in a single polarization mode.
The achievable rates for a dual-polarized transmission system
would be approximately (but slightly less than) twice as
large as for the single polarization system considered here,
but a more complicated vector version of (1), taking into
account the effects of fiber birefringence and coupling between 
the polarization modes as well as the stochastic nature of 
polarization mode dispersion, would need to be considered. 



C. Digital Backpropagation 

Throughout propagation over an optical fiber, stochastic 
effects (noise), linear effects (dispersion) and nonlinear ef- 
fects (Kerr nonlinearity) interact, and — even in the absence 
of noise — solving the GNLS equation requires numerical 
techniques. On the other hand, in the absence of noise, the 
system is invertible, i.e., the transmitted signal A(0, t) can be 
recovered from the received signal A(L, t) by inverting the
channel. When the channel is inverted by digital signal pro- 
cessing, we say the receiver performs digital backpropagation. 

The most commonly used numerical method to solve the 
GNLS equation is the split-step Fourier method [13], [14].
The basic idea is to divide the total fiber length into short 
segments, then to consider each segment as the concatenation 
of (separable) nonlinear and linear transforms (for distributed 
amplification, an additive noise is added after the linear step). 
In the following, we briefly review the split-step Fourier 
method. For simplicity of the presentation, we ignore the 
effects of amplification, which can be incorporated into a 
numerical solver in an obvious manner. 

For a known $A(z = z_0, t)$, the split-step Fourier method
calculates $A(z = z_0 + h, t)$ as follows. First, in the absence
of linear effects, the GNLS equation has the form

$$\frac{\partial A}{\partial z} = j\gamma |A|^2 A,$$

with solution

$$A(z = z_0 + h, t) = A(z = z_0, t) \exp\left(j\gamma |A(z = z_0, t)|^2 h\right).$$

We now use this solution as the input to the linear step,
i.e., let

$$\hat{A}(z = z_0, t) = A(z = z_0, t) \exp\left(j\gamma |A(z = z_0, t)|^2 h\right)$$

be the input to the linear step. The linear form of the GNLS
equation is

$$\frac{\partial A}{\partial z} = -\frac{\alpha}{2} A - \frac{j\beta_2}{2} \frac{\partial^2 A}{\partial t^2},$$
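which can be efficiently solved in the frequency domain.
Defining

$$A(z,t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde{A}(z,\omega) \exp(j\omega t)\, d\omega,$$

it can be shown that

$$\tilde{A}(z = z_0 + h, \omega) = \tilde{A}(z = z_0, \omega) \exp\left(\left(\frac{j\beta_2}{2}\omega^2 - \frac{\alpha}{2}\right) h\right). \qquad (2)$$

Putting this together, we have

$$A(z = z_0 + h, t) = \mathcal{F}^{-1}\left\{\mathcal{F}\{\hat{A}(z = z_0, t)\} \exp\left(\left(\frac{j\beta_2}{2}\omega^2 - \frac{\alpha}{2}\right) h\right)\right\},$$

where $\hat{A}(z = z_0, t)$ denotes the output of the nonlinear step and
$\mathcal{F}$ is the Fourier transform operator.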
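The two sub-steps can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the simulator used for the results: it uses a naive O(n^2) DFT in place of an FFT, assumes units of ps, km and W, and takes the $\beta_2$ and $\gamma$ values from Table I. The Gaussian test pulse and the function names are assumptions made for the example. The backward step applies the inverse operators in reverse order so that a single noise-free step is undone exactly; the text itself simply describes running the method with a negative step size.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); adequate for a small sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def nonlinear_step(a, h, gamma):
    """Apply A -> A exp(j gamma |A|^2 h) sample by sample."""
    return [v * cmath.exp(1j * gamma * abs(v) ** 2 * h) for v in a]


def linear_step(a, h, dt, beta2, alpha):
    """Multiply the spectrum by exp((j beta2 w^2 / 2 - alpha / 2) h), as in (2)."""
    n = len(a)
    spectrum = dft(a)
    out = []
    for k, s in enumerate(spectrum):
        f = k if k < n // 2 else k - n      # upper bins are negative frequencies
        w = 2 * math.pi * f / (n * dt)      # angular frequency in rad/ps
        out.append(s * cmath.exp((1j * beta2 * w * w / 2 - alpha / 2) * h))
    return dft(out, inverse=True)


def forward_step(a, h, dt, beta2, gamma, alpha=0.0):
    """One split step: nonlinear transform followed by the linear one."""
    return linear_step(nonlinear_step(a, h, gamma), h, dt, beta2, alpha)


def backward_step(a, h, dt, beta2, gamma, alpha=0.0):
    """Undo one forward step: inverse linear, then inverse nonlinear."""
    return nonlinear_step(linear_step(a, -h, dt, beta2, alpha), -h, gamma)


BETA2 = -21.668   # ps^2/km (Table I)
GAMMA = 1.27      # 1/(W km) (Table I)
N, DT, H = 64, 1.0, 1.0   # samples, ps per sample, km per step (assumed)

# Assumed test input: Gaussian pulse with 1 mW peak power and 5 ps width.
pulse = [math.sqrt(1e-3) * math.exp(-((i - N / 2) * DT) ** 2 / (2 * 5.0 ** 2))
         for i in range(N)]

propagated = forward_step(pulse, H, DT, BETA2, GAMMA)
recovered = backward_step(propagated, H, DT, BETA2, GAMMA)

energy_in = sum(abs(v) ** 2 for v in pulse)
energy_out = sum(abs(v) ** 2 for v in propagated)
roundtrip_error = max(abs(r - p) for r, p in zip(recovered, pulse))
```

With the loss set to zero both sub-steps only rotate phases, so the pulse energy is preserved, and the backward step returns the original samples up to floating-point error.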

Digital backpropagation is then accomplished by the split- 
step Fourier method, using a negative step-size h. Note that, 
in general, A(z, t) is the complex envelope of a multi-channel 
optical signal. It follows that full compensation of channel 
impairments — even if only a single channel is of interest 
to the receiver — requires backpropagation to be performed 
on the mM/fi-channel signal, since nonlinearity induces in- 
teraction between signal components at non-overlapping fre- 
quencies. However, in practice, receivers operate on a per- 
channel basis. Even if a multi-channel receiver were available, 
co -propagating channels may be optically-routed in or out 
throughout transmission, and thus channels that have co- 
propagated with the desired channel may not even be available 
at the receiver (and those channels that are available may not 
have co-propagated with the desired channel). Therefore, we 
consider single-channel backpropagation, in which the receiver 
first extracts the channel of interest from A(z = L,t) (via a 
bandpass filter), and performs digital backpropagation on the 
corresponding signal. 

III. Achievable Rates 

Although many current state-of-the-art systems include 
some form of electronic dispersion compensation (i.e., equal- 
ization) in the receiver, digital backpropagation is significantly 
more computationally intensive, since many steps — each of 
which has roughly the complexity of a standard equalization 
scheme — of the split-step Fourier method are required to 
accurately compensate the nonlinear effects. 

In this section we compare the achievable information rates 
when (only) linear equalization is performed to the achiev- 
able rates of a system that performs digital backpropagation. 
Furthermore, the resulting capacity estimates serve as upper 
bounds on the performance of a coded modulation system, the 
design of which we consider in Section IV.

A. Memoryless Capacity Estimation 

In [1], Essiambre et al. present an estimate of the
information-theoretic capacity of optical fiber networks. In this section,
we review their technique, which we will make use of in the 
following. 
Fig. 1. System model for memoryless capacity evaluation 



1) Transmitter: We consider a system that employs
pulse-amplitude modulation (PAM) with (orthonormal) sinc pulses.
That is, the transmitted signal (corresponding to the baseband
representation of the l-th channel) is of the form

$$x_l(t) = \sum_{k=-\infty}^{\infty} \phi_{k,l} \frac{1}{\sqrt{T_s}} \mathrm{sinc}\left(\frac{t - kT_s}{T_s}\right),$$

where $\mathrm{sinc}(\theta) = \frac{\sin \pi\theta}{\pi\theta}$. The $\phi_{k,l}$ are elements of a
discrete-amplitude continuous-phase input constellation $\mathcal{M}$, i.e., for N
rings, $\theta \in [0, 2\pi)$, and r > 0,

$$\mathcal{M} = \{ m \cdot r \exp(j\theta) \mid m \in \{1, 2, \ldots, N\} \}.$$

Each ring is assumed equiprobable, and for a given ring, the 
phase distribution is uniform. This choice of constellation is 
motivated by the fact that the channel represented by the 
GNLS equation can be argued to be statistically rotationally invariant
(i.e., for a channel with conditional distribution $f(y \mid x)$,
$f(y \mid x) = f(y \exp(j\theta) \mid x \exp(j\theta))$ for $\theta \in [0, 2\pi)$) and thus
points on the same ring can be considered "equivalent", which 
reduces the computational requirements in characterizing the 
channel. Furthermore, it is well known that, for sufficiently 
many rings, the Shannon Limit of the AWGN channel can be 
closely approached, and one would expect this to be true also 
for the non-AWGN channel considered here. 
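As a small sketch of the input distribution just described, the snippet below draws points with an equiprobable ring index and a uniform phase, and compares the empirical mean power with the closed form $r^2 \sum_{m=1}^{N} m^2 / N$. The helper names, the seed and the sample count are assumptions made for the example.

```python
import cmath
import math
import random


def sample_ring_point(n_rings, r, rng):
    """Draw one constellation point: equiprobable ring, uniform phase."""
    m = rng.randint(1, n_rings)                 # ring index in {1, ..., N}
    theta = rng.uniform(0.0, 2.0 * math.pi)     # phase
    return m * r * cmath.exp(1j * theta)


def mean_power(n_rings, r):
    """Average of |x|^2 over equiprobable rings of radius m * r."""
    return r * r * sum(m * m for m in range(1, n_rings + 1)) / n_rings


rng = random.Random(0)
points = [sample_ring_point(64, 1.0, rng) for _ in range(20000)]
empirical_power = sum(abs(p) ** 2 for p in points) / len(points)
```

Only the ring index affects the power, which is why rotational invariance lets all points on one ring be treated as equivalent when the channel is characterized.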

In the general case of a multi-channel system having 2B + 1
channels with a channel spacing $1/T_s$ Hz, the input to the fiber
has the form

$$A(z = 0, t) = \sum_{k=-\infty}^{\infty} \sum_{l=-B}^{B} \phi_{k,l} \frac{1}{\sqrt{T_s}} \mathrm{sinc}\left(\frac{t - kT_s}{T_s}\right) e^{j 2\pi l t / T_s}.$$



2) Receiver: By convention, the channel of interest (COI)
is assumed to correspond to l = 0. From the channel
output A(L, t), the (baseband) digital coherent optical receiver
extracts the COI via an ideal low-pass filter, and the corresponding
signal is sampled at the rate $1/T_s$. The resulting
discrete-time signal is then compensated by digital signal
processing, i.e., backpropagation (BP) or linear equalization
(EQ), providing estimates $\hat{\phi}_{k,0}$ of the transmitted symbols
$\phi_{k,0}$, as illustrated in Fig. 1.

3) Channel Model: In order to facilitate the capacity es- 
timation, the discrete-time channel is assumed to be memo- 
ryless, i.e., it is assumed that backpropagation removes any 
dependence (introduced by the channel) between received 
symbols. The (memoryless) conditional distribution of the 
channel is estimated from numerical simulations. 

Since the channel is statistically rotationally invariant,
observations of transmitted points from the same ring are first
'back-rotated' to the real axis, as illustrated in Fig. 2. The
back-rotated points are represented by $\tilde{\phi}_{k,l}$,

$$\tilde{\phi}_{k,l} = \hat{\phi}_{k,l} \exp\left(-j(\Phi_{\mathrm{XPM}} + \angle \phi_{k,l})\right),$$

where $\Phi_{\mathrm{XPM}}$ is a constant (input-independent) phase rotation
contributed by cross-phase modulation (XPM).

Fig. 2. Channel outputs for a fixed-ring input, and back-rotated outputs.

Next, for each i and a fixed l (the channel of interest), we
calculate the mean $\mu_i$ and covariance matrix $\Omega_i$ (of the real and
imaginary components) of those $\tilde{\phi}_{k,l}$ corresponding to the i-th
ring, and model the distribution of those $\tilde{\phi}_{k,l}$ by $\mathcal{N}(\mu_i, \Omega_i)$.
Finally, from the rotational invariance of the channel, the
channel is modeled as

$$f(y \mid x = r \cdot i \exp(j\phi)) \sim \mathcal{N}(\mu_i \exp(j\phi), \Omega_i),$$

where the (constant) phase rotation due to $\Phi_{\mathrm{XPM}}$ is ignored,
since it can be canceled in the receiver. Note that this model
reduces to an additive 'noise' model when $\mu_i = (r \cdot i, 0)$, but
in general this relationship need not be true.
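The back-rotation and Gaussian fit for a single ring can be sketched as follows. The data here are synthetic and purely illustrative: the ring radius, XPM phase and noise level are made-up values, and the "channel" is just a constant rotation plus circular Gaussian noise rather than a GNLS simulation.

```python
import cmath
import math
import random


def back_rotate(received, sent, phi_xpm):
    """Rotate a received sample by -(Phi_XPM + angle of the sent point)."""
    return received * cmath.exp(-1j * (phi_xpm + cmath.phase(sent)))


def mean_and_covariance(samples):
    """Mean and 2x2 covariance of the (real, imaginary) components."""
    n = len(samples)
    mr = sum(s.real for s in samples) / n
    mi = sum(s.imag for s in samples) / n
    crr = sum((s.real - mr) ** 2 for s in samples) / n
    cii = sum((s.imag - mi) ** 2 for s in samples) / n
    cri = sum((s.real - mr) * (s.imag - mi) for s in samples) / n
    return (mr, mi), ((crr, cri), (cri, cii))


# Synthetic stand-in values (assumptions, not taken from the paper).
RING_RADIUS, PHI_XPM, SIGMA = 3.0, 0.4, 0.05

rng = random.Random(1)
rotated = []
for _ in range(2000):
    sent = RING_RADIUS * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
    noise = complex(rng.gauss(0.0, SIGMA), rng.gauss(0.0, SIGMA))
    received = sent * cmath.exp(1j * PHI_XPM) + noise
    rotated.append(back_rotate(received, sent, PHI_XPM))

mu, omega = mean_and_covariance(rotated)
```

For this additive toy channel the fitted mean lands near (radius, 0), which is the special case the text singles out; for the fiber channel the mean and covariance are whatever the simulated samples give.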

4) Capacity Estimation: The mutual information of the 
memoryless channel is 

$$I(X;Y) = \iint f(x,y) \log_2 \frac{f(y \mid x)}{f(y)} \, dx \, dy,$$

where f(x) represents the input distribution on $\mathcal{M}$ with
equiprobable rings and a uniform phase distribution, which 
provides an estimate of the capacity of an optically-routed 
fiber-optic communication system. 

5) Signaling Parameters: In Table II we provide the parameters
of the signaling scheme, to be used throughout the
remainder of this work. In general, further increasing the 
number of simulated channels has a negligible effect on the 
capacity estimates. 

TABLE II
Signaling Parameter Values

Baud rate $1/T_s$              100 GHz
Channel bandwidth W            101 GHz
Number of rings N              64
Number of channels 2B + 1      5



B. Results 

In Fig. 3 we present the achievable spectral efficiencies.
We consider systems of length L = 500, 1000 and 2000 km.
The signal-to-noise ratio (SNR) is defined as

$$\mathrm{SNR} = \frac{P}{N_{\mathrm{ASE}} W},$$




[Fig. 3 plot: spectral efficiency versus SNR (dB) for L = 2000 km, 1000 km and 500 km, each with EQ only and with BP, together with the Shannon limit (AWGN).]



Fig. 3. (Theoretically) achievable spectral efficiencies for BP and EQ at 
different transmission lengths. Also shown (by the isolated symbols) are the 
spectral efficiencies achieved by the staircase-coded systems described in
Table IV.



where P is the average transmitter power, W is the bandwidth
occupied by a single channel, and $N_{\mathrm{ASE}} = L \alpha h \nu_s K_T$ is the
power spectral density of the noise. In contrast to conventional
linear Gaussian channel models, $N_{\mathrm{ASE}}$ is fixed by the choices
of L and the amplification technique. Therefore, for a fixed
system, the SNR can be increased only by increasing the input 
power. 
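As a rough numerical sketch of this definition, the snippet below evaluates $N_{\mathrm{ASE}}$ and the SNR from the Table I and Table II values. Treating the input power as a per-channel power in dBm is an assumption of the example, as are the function names.

```python
import math

PLANCK = 6.62607015e-34   # Planck's constant, J*s
ALPHA = 4.605e-5          # loss, 1/m (Table I)
NU_S = 193.41e12          # center carrier frequency, Hz (Table I)
K_T = 1.13                # phonon occupancy factor (Table I)
W = 101e9                 # channel bandwidth, Hz (Table II)


def n_ase(length_m):
    """Noise power spectral density N_ASE = L * alpha * h * nu_s * K_T, in W/Hz."""
    return length_m * ALPHA * PLANCK * NU_S * K_T


def snr_db(p_in_dbm, length_m):
    """SNR = P / (N_ASE * W) in dB, with P given in dBm (assumed per channel)."""
    p_watts = 1e-3 * 10 ** (p_in_dbm / 10)
    return 10 * math.log10(p_watts / (n_ase(length_m) * W))


snr_2000km = snr_db(-4.0, 2000e3)
snr_1000km = snr_db(-4.0, 1000e3)
```

Since $N_{\mathrm{ASE}}$ grows linearly with L, halving the length at a fixed input power raises the SNR by about 3 dB, and for a fixed system the only remaining knob is the input power.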

For L = 2000 km, the peak spectral efficiency is ap- 
proximately 6.45 bits/s/Hz per polarization when only linear 
equalization is performed, but increases to approximately 7.2 
bits/s/Hz per polarization for digital backpropagation. For 
L = 1000 km, the peak spectral efficiency is approximately 
7.4 bits/s/Hz per polarization when only linear equalization 
is performed, but increases to approximately 8.1 bits/s/Hz 
per polarization for digital backpropagation. Finally, for L = 
500 km, the peak spectral efficiency is approximately 8.45 
bits/s/Hz per polarization when only linear equalization is 
performed, but increases to approximately 9.0 bits/s/Hz per 
polarization for digital backpropagation. 

For the cases considered, digital backpropagation in- 
creases the achievable spectral efficiencies — relative to linear 
equalization — by 0.55 to 0.75 bits/s/Hz per polarization. From 
the standpoint of achievable rates, the channel is "nearly" 
linear for most input powers of interest, in the sense that 
linear equalization achieves rates that closely approach those 
achievable via backpropagation. Furthermore, even when the 
input power is such that the achievable rate is maximized, 
the distortion introduced by the channel is well-modeled as 
AWGN, and thus classical coding methods ought to provide 
near-capacity reliable communications. However, due to the 
extremely high per-channel data rates of fiber-optic systems, 
implementation challenges arise. In the following, we propose 
a pragmatic coded modulation system — based on staircase 
codes — that provides excellent performance and an efficient 
high-speed implementation. 
IV. A Pragmatic Coded-Modulation Scheme 

Although staircase codes are binary error-correcting codes 
with a syndrome-based decoding algorithm, they can be 
adapted — via known techniques — to provide error-correction 
in high-spectral-efficiency communication systems, while 
maintaining their efficient decoding architecture. In the fol- 
lowing, we first review these techniques, then we provide 
the parameters of staircase-coded systems and present their 
performance. 

A. Coding 

For high-spectral-efficiency communications, the set of 
channel input symbols (i.e., the modulation constellation) must 
be sufficiently large, and coding is required on the resulting 
non-binary input alphabet. At first glance, this would seem to 
require the design of error-correction codes over non-binary 
alphabets, with a decoding algorithm that accounts for the 
distance metric implied by the underlying channel. Indeed, 
this 'direct' approach provides motivation for trellis-coded 
modulation [15], in which the code is designed to optimize the
minimum Euclidean distance between transmitted sequences. 

Alternatively, by considering the set of channels induced
by the bit-labels of the constellation points, coded modulation
via binary codes can be applied with, in principle, no loss
of optimality. To see this, consider a $2^M$-point constellation
$\mathcal{A}$, for which each symbol is labeled with a unique binary
M-tuple $(b_1, b_2, \ldots, b_M)$. For a channel with input $X \in \mathcal{A}$, and
an output Y, the capacity of the resulting channel is I(X; Y)
(maximized over the input distribution p(x)), which can be
expanded by the chain rule of mutual information:

$$\begin{aligned} I(X;Y) &= I(b_1, b_2, \ldots, b_M; Y) \\ &= I(b_1;Y) + I(b_2;Y \mid b_1) + \cdots + I(b_M;Y \mid b_1, b_2, \ldots, b_{M-1}). \qquad (3) \end{aligned}$$

Note that each term (i.e., the sub-channels) in the expan- 
sion defines a binary-input channel, for which binary error- 
correction codes — such as staircase codes — can be applied; 
this approach is referred to as multi-level coding (MLC) [16].
Furthermore, if a capacity-approaching code is applied to each 
sub-channel, then the capacity of the modulation scheme is 
achieved, that is, there is no loss in optimality in applying 
binary coding to each sub-channel. However, from (3), it is
implied that decoding is performed in stages, since decoded 
bits from lower-indexed levels provide side information for 
decoding higher levels; the resulting decoding architecture is 
referred to as a multi-stage decoder. 

Note that the multi-stage architecture introduces decoding 
latency to the higher levels, requires memory to store channel 
outputs prior to decoding (since outputs are 'held' until 
decoded bits from the lower levels are available), and requires 
an individual code for each sub-channel. Clearly, the latency 
and memory issues can be eliminated simply by ignoring the 
conditioning in (3), and the resulting system has capacity

$$C_{\mathrm{PID}} = I(b_1;Y) + I(b_2;Y) + \cdots + I(b_M;Y),$$



[Fig. 4 image: grid of 6-bit labels, one per point of the 64-QAM constellation.]



Fig. 4. A Gray-labeled 64-QAM constellation. 



Fig. 5. A mixed-labeled 256-QAM constellation, where $G_{64}$ represents a
Gray-labeled 64-QAM constellation.

where PID stands for "parallel independent decoding". However,
even when capacity-achieving codes (i.e., with rates
$I(b_i; Y)$) are applied to each sub-channel, $C_{\mathrm{PID}}$ may be
significantly less than I(X; Y). Note that the capacities of the
individual bit-channels depend on the constellation labeling;
for MLC their overall sum is fixed, regardless of the labeling,
but for PID their sum (i.e., $C_{\mathrm{PID}}$) depends on the labeling. In
fact, for Gray-labeling¹ (see Fig. 4), the difference between
$C_{\mathrm{PID}}$ and I(X;Y) essentially vanishes, as shown in [17].
Furthermore, even though the capacities of the individual sub- 
channels are not identical, a single binary error-correcting code 
(whose rate is the average of the bit-channel rates) provides 
near-capacity performance, which addresses the third issue 
with MLC; this approach is referred to as bit-interleaved coded 
modulation (BICM). 
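The labeling dependence can be checked numerically on a toy example. The sketch below assumes a 4-point constellation with uniform inputs and a made-up discrete channel in which a point is confused only with its nearest neighbors (probability 0.1 each); none of these values come from the paper. It computes I(X;Y), the MLC sum from the chain rule, and the PID sum for a Gray and a natural labeling.

```python
import math


def mutual_information(joint):
    """I(U;V) in bits from a joint pmf given as {(u, v): probability}."""
    pu, pv = {}, {}
    for (u, v), p in joint.items():
        pu[u] = pu.get(u, 0.0) + p
        pv[v] = pv.get(v, 0.0) + p
    return sum(p * math.log2(p / (pu[u] * pv[v]))
               for (u, v), p in joint.items() if p > 0)


def joint_pmf(labels, channel, key, cond=None):
    """Joint pmf of (key(label), y) for uniform inputs satisfying cond(label)."""
    xs = [x for x in range(len(labels)) if cond is None or cond(labels[x])]
    joint = {}
    for x in xs:
        for y, p in enumerate(channel[x]):
            k = (key(labels[x]), y)
            joint[k] = joint.get(k, 0.0) + p / len(xs)
    return joint


E = 0.1  # assumed nearest-neighbor confusion probability
CHANNEL = [
    [1 - E, E, 0, 0],
    [E, 1 - 2 * E, E, 0],
    [0, E, 1 - 2 * E, E],
    [0, 0, E, 1 - E],
]


def rates(labels):
    """Return I(X;Y), the MLC (chain-rule) sum and the PID sum for a labeling."""
    i_xy = mutual_information(joint_pmf(labels, CHANNEL, key=lambda b: b))
    i_b1 = mutual_information(joint_pmf(labels, CHANNEL, key=lambda b: b[0]))
    i_b2 = mutual_information(joint_pmf(labels, CHANNEL, key=lambda b: b[1]))
    # I(b2;Y|b1): average over the two equiprobable values of b1.
    i_b2_given_b1 = sum(
        0.5 * mutual_information(
            joint_pmf(labels, CHANNEL, key=lambda b: b[1],
                      cond=lambda b, v=v: b[0] == v))
        for v in (0, 1))
    return {"I(X;Y)": i_xy, "MLC": i_b1 + i_b2_given_b1, "PID": i_b1 + i_b2}


gray = rates([(0, 0), (0, 1), (1, 1), (1, 0)])
natural = rates([(0, 0), (0, 1), (1, 0), (1, 1)])
```

In this toy setting the MLC sum equals I(X;Y) for both labelings, while the PID sum is smaller and is closer to I(X;Y) under the Gray labeling, in line with the discussion above.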

B. Shaping 

Implicit in the definition of channel capacity is an optimiza- 
tion over the input alphabet of the channel. For example, the 
optimal input distribution for the additive white Gaussian noise 
channel is itself Gaussian. Indeed, the ring-like constellations 
used in Section III provide shaping gain relative to a QAM-like
constellation. In order to approach the capacity estimates
of the fiber optic channel, shaping is essential in any coded 
modulation scheme. 

¹A Gray labeling has the property that the binary M-tuples of
nearest-neighbor constellation points differ in only a single position.
Fig. 6. The coded modulation system with shaping. 
TABLE III
Achievable Rates Per Polarization for Pragmatic
Coded-Modulation System

Fiber System         K    $p_{\mathrm{avg}}$                 $P_{\mathrm{in}}$ (dBm)    $I_p$ (bits/s/Hz)
L = 500 km, EQ       8    $1.61 \times 10^{-2}$    -6            8.05
L = 500 km, BP       8    $3.52 \times 10^{-3}$    -4            8.73
L = 1000 km, EQ      6    $3.88 \times 10^{-3}$    -6            6.78
L = 1000 km, BP      8    $2.22 \times 10^{-2}$    -4            7.77
L = 2000 km, EQ      6    $2.52 \times 10^{-2}$    -6            5.98
L = 2000 km, BP      6    $5.16 \times 10^{-3}$    -4            6.72




In the following, we describe an adaptation of trellis
shaping [18] to a bit-interleaved coded modulation system.
Consider the bit-labeling in Fig. 5. For each point in a given
quadrant, the two most significant bits are the same; we refer
to these two bits as the shaping bits. Furthermore, by the chain
rule of mutual information, we have

$$\begin{aligned} I(X;Y) &= I(b_1, b_2, \ldots, b_K, b_{K+1}, b_{K+2}; Y) \\ &= I(b_1, b_2, \ldots, b_K; Y) + I(b_{K+1}, b_{K+2}; Y \mid b_1, b_2, \ldots, b_K), \qquad (4) \end{aligned}$$

which provides a 'two'-level (i.e., an MLC scheme with two
generalized levels) interpretation of the proposed system, with
M = K + 2. The first term in (4) represents the lowest level,
to which error-correction coding is applied; for the reasons
stated previously, we will use bit-interleaved coded modulation
at this level. If the rate of the error-correcting code is R,
then this term communicates $K \cdot R$ bits per symbol. The
second term, the upper-level of the pseudo-MLC scheme, is
responsible for providing shaping. In trellis shaping, this is
provided by communicating, via $(b_{K+1}, b_{K+2})$, a single bit
per symbol, while using a Viterbi-based shaping algorithm to 
select the remaining bit (of freedom) to produce a sequence of 
symbols with a (nearly) bi-dimensional Gaussian distribution. 
Intuitively, the bi-dimensional Gaussian distribution results 
from two facts: the Viterbi search selects the signal path 
that minimizes energy, and for a fixed entropy, the Gaussian 
distribution is the minimum energy distribution. 

As in [18], the Viterbi algorithm operates on the trellis of a
four-state convolutional code $C_U$ with generator matrix $G_U =
[1 + D^2, 1 + D + D^2]$ and syndrome-former matrix $H_U^T =
[1 + D + D^2, 1 + D^2]^T$. The overall operation of the system
is as illustrated in Fig. 6.
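A quick way to see why this generator and syndrome-former pair fits together is to check that every codeword has a zero syndrome, so that adding a codeword to a pair of bit sequences leaves their syndrome unchanged. The sketch below does this with GF(2) polynomials stored as integer bit masks (bit k stands for $D^k$); the helper names and the example sequences are assumptions, and the Viterbi search itself is not shown.

```python
def gf2_mul(a, b):
    """Multiply two GF(2) polynomials stored as bit masks (bit k <-> D^k)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


G = (0b101, 0b111)     # [1 + D^2, 1 + D + D^2]
H_T = (0b111, 0b101)   # [1 + D + D^2, 1 + D^2]^T


def encode(u):
    """Rate-1/2 encoding: one input sequence u(D) gives two output sequences."""
    return tuple(gf2_mul(u, g) for g in G)


def syndrome(y):
    """Syndrome s(D) = y1(D) h1(D) + y2(D) h2(D) of a pair of sequences."""
    return gf2_mul(y[0], H_T[0]) ^ gf2_mul(y[1], H_T[1])


orthogonal = syndrome(G) == 0            # G times H^T is zero over GF(2)

u = 0b1011001                            # arbitrary input sequence
codeword = encode(u)
y = (0b110100111, 0b001011010)           # arbitrary pair of sequences
shifted = (y[0] ^ codeword[0], y[1] ^ codeword[1])
same_syndrome = syndrome(y) == syndrome(shifted)
```

This invariance is what leaves the shaper one bit of freedom per symbol: among all pairs of sequences that share the syndrome carrying the data, the Viterbi search can pick the one that gives the lowest-energy symbols.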

C. Pragmatic Coded-Modulation via Staircase Codes 

To further reduce the complexity of the coded modulation 
system, we focus on systems for which the error-correcting 
code (at the lowest level) is decoded by a hard-decision 
decoder that receives hard decisions from the channel. That is, 
the demodulator in Fig. 6 outputs K bits (the $b_1, b_2, \ldots, b_K$
corresponding to the constellation point closest to the received
symbol) to the FEC decoder for every received symbol; we 
assume coding is applied to these bits via BICM, and refer to 
such a system as a "pragmatic" coded-modulation system. 



In a manner similar to that applied for the capacity estimates
in Fig. 3, the achievable rates of the pragmatic coded-modulation
system can be estimated via numerical simulations.
Since BICM and hard-decision quantization are performed at
the lowest level, the capacity of the resulting level is

$$K(1 - H_2(p_{\mathrm{avg}})),$$

where $p_{\mathrm{avg}}$ is the average error rate of the received bits at the
lowest level, and $H_2(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$
is the binary entropy function. Furthermore, the highest level
communicates exactly 1 bit of information per symbol, and
the maximum achievable information rate for the pragmatic
system is thus

$$I_p = 1 + K(1 - H_2(p_{\mathrm{avg}})).$$

In Table III the estimated values of $I_p$ are presented, based
on numerical simulations of the systems; in each case, K and
the average input power $P_{\mathrm{in}}$ are optimized to maximize $I_p$.
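The formula for $I_p$ is easy to evaluate directly. The sketch below recomputes it from the K and $p_{\mathrm{avg}}$ values listed in Table III and compares against the listed rates; the exponent of the first entry is taken as -2, the value consistent with its listed $I_p$. It also includes the analogous count for an actual code of rate R at the lowest level, which by the earlier "K · R bits per symbol" statement gives 1 + K·R. The function names are assumptions.

```python
import math


def binary_entropy(x):
    """H_2(x) = -x log2 x - (1 - x) log2 (1 - x), with H_2(0) = H_2(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def pragmatic_rate(k, p_avg):
    """I_p = 1 + K (1 - H_2(p_avg)), in bits/s/Hz per polarization."""
    return 1 + k * (1 - binary_entropy(p_avg))


def coded_spectral_efficiency(k, rate):
    """Spectral efficiency 1 + K * R when a rate-R code protects the K bits."""
    return 1 + k * rate


# (system, K, p_avg, listed I_p) as given in Table III.
TABLE_III = [
    ("L = 500 km, EQ", 8, 1.61e-2, 8.05),
    ("L = 500 km, BP", 8, 3.52e-3, 8.73),
    ("L = 1000 km, EQ", 6, 3.88e-3, 6.78),
    ("L = 1000 km, BP", 8, 2.22e-2, 7.77),
    ("L = 2000 km, EQ", 6, 2.52e-2, 5.98),
    ("L = 2000 km, BP", 6, 5.16e-3, 6.72),
]

recomputed = {name: pragmatic_rate(k, p) for name, k, p, _ in TABLE_III}
```

Each recomputed value agrees with the listed $I_p$ to two decimal places, which is a useful consistency check on the table entries.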

Note that for L = 2000 km, the pragmatic coded modulation
system has a capacity within 0.47 bits/s/Hz per
polarization of the peak spectral efficiency when only linear
equalization is performed, and within 0.48 bits/s/Hz per
polarization for digital backpropagation. For L = 1000 km, the
pragmatic coded modulation system has a capacity within
0.62 bits/s/Hz per polarization of the peak spectral efficiency
when only linear equalization is performed, and within 0.43
bits/s/Hz per polarization for digital backpropagation. Finally,
for L = 500 km, the pragmatic coded modulation system has
a capacity within 0.40 bits/s/Hz per polarization of the peak
spectral efficiency when only linear equalization is performed, 
and within 0.27 bits/s/Hz per polarization for digital backprop- 
agation. In each case, the dominant contribution to the gap in 
performance is a result of the hard quantization applied at the 
lowest level of the (two-level) coded system. Even though the 
hard quantization scheme leads to some loss in performance, 
it is directly compatible with the syndrome-based decoding of 
staircase codes. 

We now consider the design of staircase codes for use
in the pragmatic coded system. In [9], a G.709-compliant
staircase code was presented, with R = 239/255, suitable for
providing error-correction on a binary symmetric channel with
$p < 4.8 \times 10^{-3}$. This code is thus suitable for providing
error-correction for the linearly-equalized system with L = 1000 km
and the digitally-backpropagated system with L = 500 km.
For the other systems, we designed new staircase codes, the
parameters of which, including the net coding gain (NCG),
are provided in Table IV; the terminology used to describe the
codes follows that of Section II-A.

Fig. 7. Performance curves for the staircase codes in Table IV.

TABLE IV
Staircase Codes for Pragmatic Coded Systems

Fiber System         m      t    R          NCG (dB)    Spec. Eff. (bits/s/Hz)
L = 500 km, EQ       190    4    77/95      10.47       7.48
L = 500 km, BP       255    3    239/255    9.41        8.50
L = 1000 km, EQ      255    3    239/255    9.41        6.62
L = 1000 km, BP      111    1    3/4        10.68       7.00
L = 2000 km, EQ      120    1    11/15      10.62       5.40
L = 2000 km, BP      628    4    146/157    9.50        6.58

In each case, the length of the (mother) BCH component
code is the smallest $2^n - 1$ that is greater than or equal to 2m.

In Fig. 7 the bit-error-rate curves are plotted. Since these
curves (other than the G.709-compliant staircase code) were
computed without a hardware implementation, we were only
able to obtain results to approximately $10^{-10}$. By the error
floor estimation methods outlined in [9] (with p set to the
average of the sub-channel error rates), none of the systems
have error floors above $10^{-20}$. Thus, extrapolating the curves
to $10^{-15}$, each code has been designed to provide an output
error rate of better than $10^{-15}$ at the input error rate induced
by its corresponding system.

In Fig. 3 the performance of the staircase coded systems
is plotted (the filled symbols), in addition to the estimated
capacity curves (the unfilled symbols). For L = 2000 km,
the system performs within 1.05 bits/s/Hz per polarization
of the peak spectral efficiency when only linear equalization
is performed, and within 0.62 bits/s/Hz per polarization for
digital backpropagation. For L = 1000 km, the system
performs within 0.78 bits/s/Hz per polarization of the peak
spectral efficiency when only linear equalization is performed,
and within 1.2 bits/s/Hz per polarization for digital
backpropagation. Finally, for L = 500 km, the system performs within
0.97 bits/s/Hz per polarization of the peak spectral efficiency
when only linear equalization is performed, and within 0.50
bits/s/Hz per polarization for digital backpropagation.

Note that the performance gap increases as the rate of
the staircase code (and corresponding sub-channel capacity)
decreases, since staircase codes perform closest to capacity
at high rates. The performance of those systems could be
improved by choosing a multi-dimensional constellation that
induces a higher rate (average) sub-channel.

V. Conclusions

We showed that digital backpropagation increases
the achievable spectral efficiencies, relative to linear
equalization, by 0.55 to 0.75 bits/s/Hz per polarization.

We proposed a pragmatic coded modulation system that
incorporates signal shaping and exploits the excellent
performance and efficient high-speed decoding architecture of
staircase codes. Reliable communication within 0.62 bits/s/Hz
per polarization of the estimated capacity of a system with
L = 2000 km is provided by the proposed system, with an
error floor below $10^{-20}$.